
Introduction of parameterizable workgroups for compute shaders and dispatch sizes for the renderer #31402


Open · wants to merge 13 commits into dev from workgroups_and_dispatch_extension

Conversation

@Spiri0 Spiri0 (Contributor) commented Jul 13, 2025

Related issue: #31393

So far, the workgroup size for the compute shader has been a single scalar, and the dispatch size always fills dispatch.x up to device.limits.maxComputeWorkgroupsPerDimension before dispatch.y is used. The default workgroup size is now [ 64, 1, 1 ] instead of [ 64 ], which ensures that all previous examples behave as before.

Example based on webgpu_compute_particles_rain; usual usage:

computeParticles = computeUpdate().compute( maxParticleCount );
renderer.compute( computeParticles );

Usage with an explicit workgroup size and dispatchSize:

computeParticles = computeUpdate().compute( [ 16, 16, 1 ] );
renderer.compute( computeParticles, [ 10, 10, 1 ] );

Since the instanceCount in the example is 25,000, a workgroup size of [ 16, 16, 1 ] with a dispatchSize of [ 10, 10, 1 ] gives ( 16 * 10 ) * ( 16 * 10 ) * 1 = 25,600 threads, enough to cover everything. If you choose smaller values for the dispatchSize, e.g. [ 9, 9, 1 ], you would see that not all raindrops are computed.

You now have full control over the thread distribution on the GPU, which was not possible before.
I'm setting the PR to draft for now, even though it works, because I might think of a few more things I'd like to add; maybe warnings if you enter something incorrectly.
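
For orientation, here is a minimal sketch (not part of this PR's API; the helper name is hypothetical) of how a dispatchSize covering a given thread count could be derived from a workgroup size:

function dispatchSizeFor( count, workgroupSize = [ 64, 1, 1 ] ) {

	// threads per workgroup is the product of all three dimensions
	const [ x = 1, y = 1, z = 1 ] = workgroupSize;
	const workgroups = Math.ceil( count / ( x * y * z ) );

	// distribute along x only; a real implementation might split across y/z
	// once device.limits.maxComputeWorkgroupsPerDimension is exceeded
	return [ workgroups, 1, 1 ];

}

// dispatchSizeFor( 25000, [ 16, 16, 1 ] ) -> [ 98, 1, 1 ] ( 98 * 256 = 25,088 threads )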


github-actions bot commented Jul 13, 2025

📦 Bundle size

Full ESM build, minified and gzipped.

              Before             After              Diff
WebGL         338.28 / 78.93     338.28 / 78.93     +0 B / +0 B
WebGPU        558.52 / 154.58    559.33 / 154.89    +803 B / +311 B
WebGPU Nodes  557.45 / 154.37    558.25 / 154.67    +803 B / +309 B

( sizes in kB: minified / gzipped )

🌳 Bundle size after tree-shaking

Minimal build including a renderer, camera, empty scene, and dependencies.

              Before             After              Diff
WebGL         469.62 / 113.66    469.62 / 113.66    +0 B / +0 B
WebGPU        634.18 / 171.62    634.98 / 171.95    +803 B / +331 B
WebGPU Nodes  589.31 / 160.95    590.11 / 161.27    +803 B / +316 B

( sizes in kB: minified / gzipped )

@Spiri0 Spiri0 marked this pull request as draft July 13, 2025 07:10
@Spiri0 Spiri0 force-pushed the workgroups_and_dispatch_extension branch from 73116e1 to 910ce62 Compare July 13, 2025 09:38
@Spiri0 Spiri0 marked this pull request as ready for review July 13, 2025 10:03
Attila Schroeder added 2 commits July 14, 2025 03:27
Comment on lines 228 to 235
let count = countOrWorkgroupSize;

if ( Array.isArray( countOrWorkgroupSize ) ) {

workgroupSize = countOrWorkgroupSize;
count = null;

}
Collaborator

Assuming the project agrees with the API change, I'm wondering if we should deprecate the old code path where "count" is passed here?

Contributor Author

I've set it up so that count is used if no workgroup is specified, so that existing user code and examples continue to run as before. But yes, I would also be in favor of removing count from the code in the long term, with the usual transition time. That would also significantly simplify the compute call in the backend.
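
A transition period like the one described could start with a console warning on the legacy path. A minimal sketch, assuming the countOrWorkgroupSize parameter from the diff above (the warning text is hypothetical):

if ( typeof countOrWorkgroupSize === 'number' ) {

	// hypothetical deprecation notice; the legacy scalar count keeps working
	console.warn( 'THREE.ComputeNode: passing a count to compute() is deprecated. Use a workgroupSize array instead.' );

}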

Comment on lines 1327 to 1329
const isValid = dispatchSize[ 0 ] > 0 && dispatchSize[ 1 ] > 0 && dispatchSize[ 2 ] > 0;

dispatchSize = isValid ? dispatchSize : computeNodeData;
Collaborator

It's valid to set @workgroup_size with only 1 or 2 values when defining the parameters inline, with the remaining dimensions implicitly set to "1" (see the spec here), so it would be nice to allow providing a workgroup size of 1 or 2 values, both to align with the native behavior and for the sake of ergonomics:

const computeShader = wgslFn( `fn computeShader() {}` );

// all are equivalent
const kernel = computeShader().compute( [ 16 ] );
const kernel = computeShader().compute( [ 16, 1 ] );
const kernel = computeShader().compute( [ 16, 1, 1 ] );

Contributor Author

I didn't know that from the W3C documentation; I'll take a look at it, thanks.

Contributor Author

This can be implemented in two ways. Since it makes no difference in performance, I prefer to check the length of the workgroupSize array before calling the WGSL code builder, and to fill it with ones if it is shorter than three values.

// before:
this.computeShader = this._getWGSLComputeCode( shadersData.compute, ( this.object.workgroupSize || [ 64 ] ).join( ', ' ) );

// after:
const workgroupSize = this.object.workgroupSize || [ 64, 1, 1 ];

if ( workgroupSize.length !== 3 ) throw new Error( 'workgroupSize must have 3 elements' );
Collaborator

Likewise, dispatchWorkgroups can also take 1 or 2 parameters, with the remainder defaulting to 1:

// all equivalent
renderer.computeAsync( kernel, [ 10 ] );
renderer.computeAsync( kernel, [ 10, 1 ] );
renderer.computeAsync( kernel, [ 10, 1, 1 ] );

I'm curious to hear other opinions.
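
The implicit-1 behavior both comments describe boils down to padding short arrays. A minimal sketch of such a helper (hypothetical name, not code from this PR):

function normalizeSize3( size ) {

	// missing dimensions default to 1, mirroring WGSL's @workgroup_size rules
	const [ x = 1, y = 1, z = 1 ] = size;
	return [ x, y, z ];

}

// normalizeSize3( [ 10 ] )    -> [ 10, 1, 1 ]
// normalizeSize3( [ 10, 2 ] ) -> [ 10, 2, 1 ]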

Contributor Author

@Spiri0 Spiri0 Jul 14, 2025

Yes, that's simple. I can do it tonight (CET). If you have any other requests by then, I can include them as well.

@@ -2351,6 +2352,38 @@ class Renderer {

}

if ( dispatchSize !== null ) {
Collaborator

Let's leave this just to the compute() node, so we avoid duplicate code.

Contributor Author

@Spiri0 Spiri0 Jul 15, 2025

Ah, you mean move the entire validation into compute() and just pass the dispatchSize through in the Renderer, like before?

Collaborator

Yes, it's better to just use the workgroupSize from the compute() node itself. I would also suggest reverting to the default value [ 64 ] and removing support for non-array numbers. Then we can add these details to the JSDoc about the auto-fill with 1.
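
The JSDoc detail mentioned here might read roughly like this (an illustrative sketch, not the PR's actual documentation):

/**
 * The workgroup size of the compute shader, given as an array of
 * 1 to 3 positive integers, e.g. [ 64 ], [ 16, 16 ] or [ 8, 8, 4 ].
 * Missing dimensions are auto-filled with 1. Default: [ 64 ].
 */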

Contributor Author

It's interesting to play with the values. I like to use the webgpu_compute_particles_rain example ( the 184: and 355: prefixes below are line numbers in that file ):

//usual
184: computeParticles = computeUpdate().compute( maxParticleCount );
355: renderer.compute( computeParticles );

//with workgroups and dispatchSize
184: computeParticles = computeUpdate().compute( [ 64 ] );
355: renderer.compute( computeParticles, [ 700 ] );

184: computeParticles = computeUpdate().compute( [ 16, 16 ] );
355: renderer.compute( computeParticles, [ 10, 10 ] );

or with values that are too low, so that not everything is computed ( [ 64 ] × [ 100 ] = 6,400 and [ 16, 16 ] × [ 10, 5 ] = 12,800 threads, both fewer than the 25,000 particles ):

184: computeParticles = computeUpdate().compute( [ 64 ] );
355: renderer.compute( computeParticles, [ 100 ] );

184: computeParticles = computeUpdate().compute( [ 16, 16 ] );
355: renderer.compute( computeParticles, [ 10, 5 ] );

Collaborator

@sunag sunag Jul 15, 2025

The user could already do this by creating two nodes:

const comp = computeUpdate();

computeParticles1 = comp.compute( [ 64 ] );
computeParticles2 = comp.compute( [ 700 ] );

renderer.compute( computeParticles1 );
renderer.compute( computeParticles2 );

Collaborator

This is a possible case, but it's not the only one, and I think we'll move further away from renderer.compute() as we improve TSL. Related: #30768 (comment)

I didn't realize you had changed the function signature regarding dispatch. I would like to suggest a change to the design of this PR.

ComputeNode function signature: compute( dispatches, workgroups )

// compute( dispatch, workgroups )
// compute( numberOrArray, numberOrArray )

nodeFn().compute( [ 1000 ], [ 64 ] )
// or
nodeFn().compute( 1000, 64 )
// or
nodeFn().compute( 1000, [ 64, 1 ] )
// etc

dispatches or workgroups can accept a number or an array as input; both will pass through a function that resolves the number or array into an array of 3 values. This shared function can live in ComputeUtils.js.

We don't use throw but console.error(), and try not to crash the renderer completely.
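
A possible shape for the shared resolver described above, following the no-throw rule; the name ComputeUtils.js is taken from the comment, the rest (including resolveSize3) is a hypothetical sketch:

export function resolveSize3( value, defaultValue = [ 1, 1, 1 ] ) {

	if ( value === null || value === undefined ) return defaultValue;

	// a plain number is treated as the x dimension
	if ( typeof value === 'number' ) return [ value, 1, 1 ];

	// arrays of 1 to 3 values are padded with 1
	if ( Array.isArray( value ) && value.length >= 1 && value.length <= 3 ) {

		const [ x = 1, y = 1, z = 1 ] = value;
		return [ x, y, z ];

	}

	console.error( 'ComputeUtils: invalid size value:', value ); // no throw, so the renderer keeps running
	return defaultValue;

}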

We can go like this too, I agree with you and I think it can be useful

renderer.compute( computeParticles, 700 ); 
// or 
renderer.compute( computeParticles, [ 700 ] ); 

Collaborator

> We can go like this too, I agree with you and I think it can be useful

// pass work group size on kernel creation
nodeFn().compute( [ 64 ] );

// pass dispatch count on evaluation
renderer.compute( computeParticles, [ 700 ] ); 

To chime in - my preference is to pass the "dispatch count" separately from the work group size when evaluating the compute kernel (whether that be renderer.compute or whatever future incarnation of the compute API).

Perhaps it's because I'm familiar with the underlying APIs here, but my impression is that when calling nodeFn().compute( workGroupSize ) the compute shader must be recreated (or dirty-checked somehow), since the work group size values must be inlined in the shader string; that shouldn't be needed when changing the dispatch count. This means that for a number of use cases you'd have to rebuild the compute node whenever the screen resizes, which seems wasteful. All to say, I think the separation of operations feels valuable and mirrors the WebGPU API.
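
A sketch of the pattern described here, using the API shape discussed in this PR (the 8×8 tile size and the resize handler are illustrative):

// the kernel, with its workgroup size inlined in the WGSL, is built once
const kernel = computeShader().compute( [ 8, 8 ] );

let dispatchSize = [ 1, 1 ];

function onResize( width, height ) {

	// only the dispatch count changes; no shader rebuild needed
	dispatchSize = [ Math.ceil( width / 8 ), Math.ceil( height / 8 ) ];

}

// per frame:
renderer.compute( kernel, dispatchSize );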

And thanks to both of you for the work here! The compute API is shaping up quite nicely, I think.

Collaborator

@sunag sunag Jul 16, 2025

Thank you for participating too @gkjohnson :)

My concern with workGroupSize being the first parameter, as in nodeFn().compute( workGroupSize ), is that compute() is a node and can be used inside a material, being processed only when the object/material is rendered, as in this example. There we have countOfPoints equivalent to the number of instances (a good idea would be to automate this based on the number of instances if no value is given), but this may not be the general case. The user could still override the number of dispatches using renderer.compute( computeParticles, 700 ), and maybe using computeParticles.setCount().

In short, compute() can be used without the renderer.compute() call if the user binds it to a NodeMaterial, so it needs a number of dispatches.

workGroupSize is also quite advanced for most users, and I hope we can automate it based on the dispatchSize and the nodes of the Fn call if the user hasn't defined it.

Collaborator

You could also extend it without changing the current functionality, e.g. nodeFn().computeInstance( workGroupSize ):

// ComputeNode.js

export const computeInstance = ( workgroupSize ) => compute( null, workgroupSize );

//
const computeParticles = nodeFn().computeInstance( workGroupSize );
renderer.compute( computeParticles, [ 700 ] ); 

Contributor Author

> I didn't realize you had changed the function signature regarding dispatch. I would like to suggest a change to the design of this PR.

Ah, that explains my confusion. It's important to be able to change the dispatchSize with each render call. I'm also thinking about dispatchWorkgroupsIndirect, which someone has already requested. I had even started a PR for it but didn't continue, because I knew it would immediately lead to a new issue once people found that the workgroups were not parameterizable.

Once defined, the workgroups are fixed. However, the dispatchSize can be changed in any render call, which I find very valuable. This is important for dynamic computes, for the mentioned dispatchWorkgroupsIndirect case that I already have in mind, and for the case @gkjohnson mentioned, where someone resizes the render screen.

const comp = computeUpdate();

// workgroups: the computeNode is built ( rigid part )
computeParticles = comp.compute( [ 16, 16 ] );

// the existing computeNode can be rendered with a different dispatchSize at any time ( dynamic part )
renderer.compute( computeParticles, [ 10, 10 ] );

Your point about saving code gave me an idea. The validation conditions for the workgroupSize and the dispatchSize are the same, or almost the same; if necessary, a flag can be used to differentiate between them. They don't have to be repeated independently in the ComputeNode and in the backend. A validation helper function that we call for the workgroupSize in the ComputeNode and for the dispatchSize in the backend could make sense. What do you think?
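
The shared helper proposed here could look roughly like this (hypothetical name and messages; the flag idea is represented by a name parameter):

function validateSize3( size, name ) {

	const isValid = Array.isArray( size ) && size.length === 3 &&
		size.every( ( v ) => Number.isInteger( v ) && v > 0 );

	if ( isValid === false ) {

		console.error( `THREE: ${ name } must be an array of 3 positive integers.` );

	}

	return isValid;

}

// validateSize3( workgroupSize, 'workgroupSize' ) in the ComputeNode,
// validateSize3( dispatchSize, 'dispatchSize' ) in the backend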
